This is the R Markdown report for Exam 3. Below, we analyze the BioLogData data set.

First, let’s take a look at our data:

##     Sample.ID Rep Well Dilution                   Substrate Hr_24 Hr_48 Hr_144
## 1 Clear_Creek   1   A1    0.001                       Water 0.000 0.000  0.000
## 2 Clear_Creek   1   A2    0.001       β-Methyl-D- Glucoside 0.004 0.005  0.004
## 3 Clear_Creek   1   A3    0.001 D-Galactonic Acid γ-Lactone 0.008 0.007  0.001
## 4 Clear_Creek   1   A4    0.001                  L-Arginine 0.003 0.002  0.000
## 5 Clear_Creek   1   B1    0.001   Pyruvic Acid Methyl Ester 0.002 0.000  0.007
## 6 Clear_Creek   1   B2    0.001                    D-Xylose 0.011 0.008  0.021

Here’s a breakdown of the columns in the BioLogData data set:

| Column ID | Description |
|---|---|
| Sample.ID | The location the sample was taken from. There are 2 water samples (Clear_Creek, Waste_Water) and 2 soil samples (Soil_1, Soil_2). |
| Rep | The experimental replicate. There are 3 replicates for each combination of experimental variables. |
| Well | The well position on the BioLog plate (e.g., A1). |
| Dilution | The dilution factor of the sample (0.001, 0.01, or 0.1). |
| Substrate | The name of the carbon source in that well. “Water” is the negative control. |
| Hr_24 | The light absorbance value after 24 hours of incubation. |
| Hr_48 | The light absorbance value after 48 hours of incubation. |
| Hr_144 | The light absorbance value after 144 hours of incubation. |
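The models further down use a single `values` column instead of Hr_24, Hr_48 and Hr_144, which suggests the three readings were stacked into long format: 4 samples × 288 wells = 1,152 wells, times 3 readings = 3,456 rows, which matches the 3,455 total degrees of freedom of the soil-vs-water ANOVA. The base-R sketch below shows one way to do that stacking; `to_long()` and the `Time` column are hypothetical names, and this is an assumption about how the data were prepared, not the report's own code. `tidyr::pivot_longer()` would be the tidyverse equivalent.

```r
# Hypothetical helper: stack the three absorbance columns into long format.
# Assumption: this mirrors how the `values` column used below was produced.
to_long <- function(d, time_cols = c("Hr_24", "Hr_48", "Hr_144")) {
  id_cols <- setdiff(names(d), time_cols)
  pieces <- lapply(time_cols, function(tc) {
    out <- d[, id_cols, drop = FALSE]  # identifying columns (Sample.ID, Rep, ...)
    out$Time <- tc                     # which reading this row holds
    out$values <- d[[tc]]              # the absorbance itself
    out
  })
  long <- do.call(rbind, pieces)
  long$Time <- factor(long$Time, levels = time_cols)
  rownames(long) <- NULL
  long
}

# Usage (assuming BioLogData is loaded): 1,152 wells become 3,456 rows
# long <- to_long(BioLogData)
```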

## Sample.ID       Rep      Well  Dilution Substrate     Hr_24     Hr_48    Hr_144 
##  "factor" "integer"  "factor" "numeric"  "factor" "numeric" "numeric" "numeric"
##        Sample.ID        Rep         Well        Dilution    
##  Clear_Creek:288   Min.   :1   A1     : 36   Min.   :0.001  
##  Soil_1     :288   1st Qu.:1   A2     : 36   1st Qu.:0.001  
##  Soil_2     :288   Median :2   A3     : 36   Median :0.010  
##  Waste_Water:288   Mean   :2   A4     : 36   Mean   :0.037  
##                    3rd Qu.:3   B1     : 36   3rd Qu.:0.100  
##                    Max.   :3   B2     : 36   Max.   :0.100  
##                                (Other):936                  
##                        Substrate       Hr_24            Hr_48       
##  2-Hydroxy Benzoic Acid     : 36   Min.   :0.0000   Min.   :0.0000  
##  4-Hydroxy Benzoic Acid     : 36   1st Qu.:0.0000   1st Qu.:0.0060  
##  D-Cellobiose               : 36   Median :0.0320   Median :0.2595  
##  D-Galactonic Acid γ-Lactone: 36   Mean   :0.1703   Mean   :0.4691  
##  D-Galacturonic Acid        : 36   3rd Qu.:0.1872   3rd Qu.:0.7220  
##  D-Glucosaminic Acid        : 36   Max.   :2.6500   Max.   :2.7850  
##  (Other)                    :936                                    
##      Hr_144       
##  Min.   :0.00000  
##  1st Qu.:0.04175  
##  Median :0.75200  
##  Mean   :0.92497  
##  3rd Qu.:1.67950  
##  Max.   :3.11600  
## 

Which sample locations are functionally different from each other in terms of what C-substrates they can utilize?

## [1] Clear_Creek Soil_1      Soil_2      Waste_Water
## Levels: Clear_Creek Soil_1 Soil_2 Waste_Water
##  [1] Water                       β-Methyl-D- Glucoside      
##  [3] D-Galactonic Acid γ-Lactone L-Arginine                 
##  [5] Pyruvic Acid Methyl Ester   D-Xylose                   
##  [7] D-Galacturonic Acid         L-Asparganine              
##  [9] Tween 40                    i-Erythitol                
## [11] 2-Hydroxy Benzoic Acid      L-Phenylalanine            
## [13] Tween 80                    D-Mannitol                 
## [15] 4-Hydroxy Benzoic Acid      L-Serine                   
## [17] α-Cyclodextrin              N-Acetyl-D-Glucosamine     
## [19] γ-Hydroxybutyric Acid       L-Threonine                
## [21] Glycogen                    D-Glucosaminic Acid        
## [23] Itaconic Acid               Glycyl-L-Glutamic Acid     
## [25] D-Cellobiose                Glucose-1-Phosphate        
## [27] α-Ketobutyric Acid          Phenylethylamine           
## [29] α-D-Lactose                 D.L -α-Glycerol Phosphate  
## [31] D-Mallic Acid               Putrescine                 
## 32 Levels: 2-Hydroxy Benzoic Acid 4-Hydroxy Benzoic Acid ... γ-Hydroxybutyric Acid
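For the question of which locations differ, one simple first pass (an assumption, not necessarily the analysis intended for this exam) is a one-way ANOVA on `Sample.ID` followed by Tukey's HSD, which gives an adjusted p-value for every pair of locations. `compare_locations()` is a hypothetical helper that expects a long-format data frame with `values` and `Sample.ID` columns, such as the combined `df` built in the next section. Because it pools all substrates, two locations can look alike here while still using different substrates, so the substrate interaction terms below still matter.

```r
# Hypothetical helper: pairwise location comparisons via Tukey's HSD.
# Assumes a long-format data frame with `values` and `Sample.ID` columns.
compare_locations <- function(long_df) {
  long_df$Sample.ID <- factor(long_df$Sample.ID)
  fit <- aov(values ~ Sample.ID, data = long_df)
  # TukeyHSD() returns one matrix per term with columns diff, lwr, upr, p adj
  TukeyHSD(fit, which = "Sample.ID")$Sample.ID
}

# Usage: compare_locations(df)
```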

Are soil and water samples significantly different overall (i.e., in the overall diversity of usable carbon sources)? What about for individual carbon substrates?

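library(dplyr)

# `creek`, `wastewater`, `soil1` and `soil2` are the four Sample.ID subsets of
# the long-format data (absorbance in a single `values` column); their
# definitions are not shown in this report. Label each as water or soil.
creek <- creek %>% 
  mutate(status = "water")
wastewater <- wastewater %>% 
  mutate(status = "water")
soil1 <- soil1 %>%
  mutate(status = "soil")
soil2 <- soil2 %>%
  mutate(status = "soil")

# Stack the four subsets and test substrate, sample type, and their interaction
df <- rbind(creek, wastewater, soil1, soil2)
summary(aov(values ~ Substrate * status, data = df))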
##                    Df Sum Sq Mean Sq F value   Pr(>F)    
## Substrate          31  143.0    4.61  12.484  < 2e-16 ***
## status              1  237.2  237.20 641.878  < 2e-16 ***
## Substrate:status   31   42.1    1.36   3.674 3.55e-11 ***
## Residuals        3392 1253.5    0.37                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
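The ANOVA above compares mean absorbance, which mixes how many substrates are used with how strongly each one is used. A rough proxy for the "overall diversity of usable carbon sources" is substrate richness: the number of non-control wells, per sample, replicate and dilution, whose absorbance exceeds a cutoff. The sketch below assumes the wide `BioLogData` layout, the 144-hour reading, and a cutoff of 0.25; the cutoff is an illustrative assumption rather than a value from this analysis, and `substrate_richness()` is a hypothetical helper.

```r
# Hypothetical helper: count substrates "used" (absorbance above a cutoff).
# Assumptions: wide layout with Hr_* columns; threshold chosen for illustration.
substrate_richness <- function(d, reading = "Hr_144", threshold = 0.25) {
  wells <- d[d$Substrate != "Water", , drop = FALSE]  # drop the negative control
  wells$positive <- wells[[reading]] > threshold       # TRUE if substrate used
  # Summing the logical column gives a count per sample/replicate/dilution
  aggregate(positive ~ Sample.ID + Rep + Dilution, data = wells, FUN = sum)
}

# Usage: rich <- substrate_richness(BioLogData)
```

The per-group counts could then be compared between sample types, for example by adding a `status` column to `rich` and running `t.test(positive ~ status, data = rich)`.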

If there are differences between samples, which C-substrates are driving those differences?
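To see which substrates drive the significant Substrate:status interaction, one option is a separate soil-vs-water comparison within each substrate, with a Benjamini-Hochberg correction because 31 tests run at once. This is a sketch under assumptions: it uses Welch's t-test from base R (a Wilcoxon test may suit skewed absorbance data better), it pools replicates, dilutions and time points, it drops the Water control, and `substrate_tests()` is a hypothetical helper expecting a long-format data frame such as `df` with `values`, `Substrate` and `status` columns.

```r
# Hypothetical helper: per-substrate soil-vs-water tests with BH correction.
substrate_tests <- function(long_df, control = "Water") {
  d <- long_df[long_df$Substrate != control, , drop = FALSE]
  groups <- split(d, as.character(d$Substrate))
  p <- vapply(groups, function(s) {
    # Welch t-test; returns NA if the test cannot run (e.g. constant data)
    tryCatch(t.test(values ~ status, data = s)$p.value,
             error = function(e) NA_real_)
  }, numeric(1))
  out <- data.frame(Substrate = names(p), p_value = unname(p),
                    p_adj = p.adjust(unname(p), method = "BH"))
  out[order(out$p_adj), , drop = FALSE]  # strongest evidence first
}

# Usage: head(substrate_tests(df))
```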

Does the dilution factor change any of these answers?

# Dilution is stored as a number, so aov() fits it as one linear term (1 Df)
mod2 <- aov(data = creek, values ~ Substrate * Dilution)
plot(values ~ Dilution, data = creek)

summary(mod2)
##                     Df Sum Sq Mean Sq F value   Pr(>F)    
## Substrate           31  11.01   0.355   2.744 1.56e-06 ***
## Dilution             1  16.75  16.748 129.373  < 2e-16 ***
## Substrate:Dilution  31  17.06   0.550   4.251 4.22e-13 ***
## Residuals          800 103.57   0.129                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Same model for the waste-water sample
mod3 <- aov(data = wastewater, values ~ Substrate * Dilution)
plot(values ~ Dilution, data = wastewater)

summary(mod3)
##                     Df Sum Sq Mean Sq F value   Pr(>F)    
## Substrate           31  35.51   1.146   5.215  < 2e-16 ***
## Dilution             1  12.74  12.740  58.000 7.40e-14 ***
## Substrate:Dilution  31  18.78   0.606   2.758 1.37e-06 ***
## Residuals          800 175.72   0.220                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Same model for the first soil sample
mod4 <- aov(data = soil1, values ~ Substrate * Dilution)
plot(values ~ Dilution, data = soil1)

summary(mod4)
##                     Df Sum Sq Mean Sq F value   Pr(>F)    
## Substrate           31   88.1   2.841   5.320  < 2e-16 ***
## Dilution             1   21.4  21.410  40.090 4.04e-10 ***
## Substrate:Dilution  31   10.1   0.325   0.609    0.955    
## Residuals          800  427.2   0.534                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Same model for the second soil sample
mod5 <- aov(data = soil2, values ~ Substrate * Dilution)
plot(values ~ Dilution, data = soil2)

summary(mod5)
##                     Df Sum Sq Mean Sq F value Pr(>F)    
## Substrate           31   74.6    2.41    5.53 <2e-16 ***
## Dilution             1   54.0   53.99  124.14 <2e-16 ***
## Substrate:Dilution  31   17.4    0.56    1.29  0.136    
## Residuals          800  347.9    0.43                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
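In all four tables Dilution has 1 Df, so `aov()` treats it as a continuous predictor with a straight-line effect on the raw scale (0.001, 0.01, 0.1). With tenfold steps, that line largely lumps the 0.001 and 0.01 plates together. A sketch of an alternative, assuming a three-level factor is an acceptable way to treat dilution here, refits the same model with `factor(Dilution)` (2 Df) so no linear trend is imposed. `dilution_anova()` is a hypothetical helper built on `lm()` and `anova()`, which give the same sequential sums of squares as `aov()`.

```r
# Hypothetical helper: refit Substrate * Dilution with Dilution as a factor.
dilution_anova <- function(d, as_factor = TRUE) {
  if (as_factor) d$Dilution <- factor(d$Dilution)  # 3 levels -> 2 Df
  anova(lm(values ~ Substrate * Dilution, data = d))
}

# Usage: compare dilution_anova(creek) with summary(mod2)
```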

Do the control samples indicate any contamination?
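The Water wells contain no added carbon source, so absorbance that rises in them may point to contamination or to carbon carried over with the inoculum. A minimal sketch, assuming the wide `BioLogData` layout shown at the top and an absorbance cutoff of 0.1 chosen only for illustration, flags each Water well whose highest reading across the three time points exceeds that cutoff; `control_check()` is a hypothetical helper.

```r
# Hypothetical helper: flag negative-control (Water) wells above a cutoff.
# Assumption: threshold = 0.1 is illustrative, not an established limit.
control_check <- function(d, threshold = 0.1,
                          reading_cols = c("Hr_24", "Hr_48", "Hr_144")) {
  ctrl <- d[d$Substrate == "Water", , drop = FALSE]
  # Highest absorbance seen in each control well across the readings
  ctrl$max_abs <- unname(apply(as.matrix(ctrl[reading_cols]), 1, max))
  ctrl$flag <- ctrl$max_abs > threshold
  ctrl[, c("Sample.ID", "Rep", "Dilution", "max_abs", "flag")]
}

# Usage: subset(control_check(BioLogData), flag)
```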